[not merge] xpu glm test #7748
Conversation
Thanks for your contribution!
The CI report is generated from the code below (updated every 30 minutes):
1 Task overview: all Required tasks passed (this PR has no Required tasks); 1 optional task failed (does not block merging).
2 Task status summary
2.1 Required tasks: 0/0 passed
2.2 Optional tasks: 1/2 passed
3 Failure details (required only): no required task failures.
696c9a5 to 240c808
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-14 00:44:40
📋 Review Summary
PR overview: fixes XPU (Kunlunxin) GLM inference issues, hardens KVCache write/prefetch robustness, fixes a local_scheduler bug, and adds support for the splitwise interrupt command
Scope of changes: cache_manager/, scheduler/local_scheduler.py, model_executor/layers/sample/sampler.py, worker/xpu_model_runner.py, splitwise/
Impact tags: [KVCache] [Scheduler] [XPU] [PD Disaggregation] [OP]
📝 PR Convention Check
The title [not merge] xpu glm test contains no official tag, and every section of the PR description is an empty placeholder, so the PR does not meet the guidelines.
Suggested title (ready to copy):
[BugFix][XPU] Fix XPU sampling params, local scheduler recycle bug, and KVCache storage robustness
Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):
## Motivation
Fixes several issues seen when running GLM model inference on XPU (Kunlunxin), including out-of-range sampling parameters, IndexError/cursor bugs when the local scheduler recycles requests, and robustness of KVCache storage write timeouts and prefetch failures; also adds support for the splitwise `interrupt_requests` control command.
## Modifications
- `fastdeploy/model_executor/layers/sample/sampler.py`: use a 32-bit MAX_INFER_SEED (2147483646) on the XPU platform; change the decoder offset multiplier to 32
- `fastdeploy/scheduler/local_scheduler.py`: fix bugs in `_recycle` where `ids.index` could raise ValueError and the cursor was decremented unconditionally; fix batch removal of expired IDs using the wrong index
- `fastdeploy/cache_manager/prefix_cache_manager.py`: when GPU blocks are insufficient, log a warning and skip the storage prefetch; wrap the prefetch path in try/except; truncate token_ids to the actual block size on storage writes
- `fastdeploy/cache_manager/cache_transfer_manager.py`: move the `flush_token_index` call from the `write_back_storage_task` finally block to the start of `_run_write_back_storage`
- `fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py`: switch batch writes to slice-by-slice writes, with both a total timeout and a per-slice timeout
- `fastdeploy/worker/xpu_model_runner.py`: return None early when `ids_remove_padding` is empty
- `fastdeploy/splitwise/internal_adapter_utils.py`: add handling for the `interrupt_requests` control command
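The slice-by-slice write with dual timeouts described above can be sketched as follows. This is a hedged illustration, not the actual attention_store.py code: `write_slice`, `slice_size`, and the timeout parameters are hypothetical stand-ins for the real Mooncake store API.

```python
import time

# Hypothetical sketch: write `data` in slices, enforcing both a total deadline
# and a per-slice timeout. Names are assumptions, not the FastDeploy API.
def write_in_slices(write_slice, data, slice_size, total_timeout, slice_timeout):
    deadline = time.monotonic() + total_timeout
    for start in range(0, len(data), slice_size):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # total write budget exhausted
        # Each slice waits at most its own timeout, capped by the remaining budget.
        ok = write_slice(data[start:start + slice_size],
                         timeout=min(slice_timeout, remaining))
        if not ok:
            return False
    return True
```

Splitting one large write into bounded slices keeps a single slow transfer from stalling the whole write-back path, which matches the robustness goal stated in the Motivation.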
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/model_executor/layers/sample/sampler.py:103 | `local_pos * 32` has no XPU platform guard, so it changes sampling behavior on all hardware |
| 🔴 Bug | fastdeploy/cache_manager/prefix_cache_manager.py:1143 | The `enable_output_caching` check was removed, so output tokens are written to the storage cache unconditionally |
| 🟡 Suggestion | fastdeploy/scheduler/local_scheduler.py:157 | A4 multi-implementation sync: has the `_recycle` bug fix been propagated to the global/dp/splitwise schedulers? |
| ❓ Question | fastdeploy/cache_manager/cache_transfer_manager.py:934 | Re-checking `storage_backend_type == "attention_store"` inside the `elif attention_store` branch is redundant |
Overall Assessment
The PR contains important fixes across several modules and is headed in the right direction. However, the two P0 issues (the unguarded XPU sampling offset affecting all platforms, and the breaking change to output caching behavior) must be resolved before merging. Given the [not merge] title, we suggest addressing the issues above and then opening a formal PR.
```diff
 offsets = paddle.where(
     is_decoder,
-    local_pos * 4,
+    local_pos * 32,
```
🔴 Bug `local_pos * 32` has no XPU platform guard and takes effect on all hardware.
In the same function, the MAX_INFER_SEED change is correctly guarded with `if current_platform.is_xpu()`, but this offset multiplier change from 4 to 32 has no platform check and will alter sampling randomness on GPU / DCU and other hardware.
Suggested fix:

```python
if current_platform.is_xpu():
    offsets = paddle.where(
        is_decoder,
        local_pos * 32,
        paddle.zeros_like(local_pos),
    )
else:
    offsets = paddle.where(
        is_decoder,
        local_pos * 4,
        paddle.zeros_like(local_pos),
    )
```

If 32 is the correct value on every platform (and 4 was a historical bug), please explain why in the PR description.
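For contrast, the MAX_INFER_SEED change that the review cites as correctly guarded can be sketched like this. `is_xpu` stands in for `current_platform.is_xpu()`, the 32-bit value matches the 2147483646 from the Modifications, and the 64-bit fallback is an assumption:

```python
# Hedged sketch: pick a seed upper bound per platform. Only the XPU value
# (2147483646) comes from the PR; the non-XPU bound is assumed.
def max_infer_seed(is_xpu: bool) -> int:
    if is_xpu:
        return 2**31 - 2  # 2147483646, fits 32-bit XPU seed kernels
    return 2**63 - 2
```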
```diff
-    input_token_ids = token_ids + request.output_token_ids
-else:
-    input_token_ids = token_ids
+input_token_ids = token_ids + request.output_token_ids
```
🔴 Bug After removing the `enable_output_caching` check, output tokens are written to the storage cache unconditionally, breaking the expectations of users who do not enable output caching.
Original semantics: output token ids are appended to the storage-key computation only when `enable_output_caching=True`; when it is False, only input tokens form the storage key. With the check removed, every user writes output blocks to storage, which can cause:
- Prefetch (which only has input tokens) can never hit these storage blocks carrying output suffixes, making the writes useless
- Behavior breakage for scenarios that must not cache output (security/privacy requirements)
Restore the original conditional, or explicitly state the intent of this change in the PR.
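The conditional the review asks to restore can be sketched as below; `build_storage_key_tokens` is a hypothetical helper for illustration, not the actual prefix_cache_manager.py code:

```python
# Hedged sketch of the original behavior: output tokens join the storage-key
# token list only when output caching is explicitly enabled.
def build_storage_key_tokens(token_ids, output_token_ids, enable_output_caching):
    if enable_output_caching:
        return token_ids + output_token_ids
    return token_ids
```

With the flag off, the stored key is derived from input tokens alone, so a later prefetch that only knows the input tokens can still match it.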
```diff
         expired_ids.append(request.request_id)

-for i, expired_id in enumerate(expired_ids):
+for expired_id in expired_ids:
```
🟡 Suggestion A4 multi-implementation sync check: `_recycle` fixes the following bugs:
- `ids.index(request_id)` could raise `ValueError` (now caught)
- `ids_read_cursor` was decremented unconditionally (now guarded by `if idx < self.ids_read_cursor`)
- `self.ids.pop(i)` used the enumeration index instead of the actual position (now replaced with `self.ids = self.ids[len(expired_ids):]`)

Please check whether global_scheduler.py / dp_scheduler.py / splitwise_scheduler.py contain the same pattern, and fix them in sync if necessary, so the other scheduler implementations do not hit the same issues.
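The first two fixes in the list above can be sketched together; this is an illustrative minimal scheduler, not the real local_scheduler.py:

```python
# Hedged sketch of the fixed _recycle pattern: catch ValueError for
# already-recycled ids, and move the read cursor back only when the removed
# entry sits before it. Class and method names are assumptions.
class MiniScheduler:
    def __init__(self, ids):
        self.ids = list(ids)
        self.ids_read_cursor = len(self.ids)  # all entries already read

    def recycle(self, request_id):
        try:
            idx = self.ids.index(request_id)
        except ValueError:
            return  # id already gone; previously this raised
        self.ids.pop(idx)
        if idx < self.ids_read_cursor:  # previously decremented unconditionally
            self.ids_read_cursor -= 1
```

An unconditional decrement would drift the cursor backwards whenever an entry at or past the cursor is removed, making the scheduler re-read requests it had already dispatched.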
```diff
 elif self.storage_backend_type == "attention_store":
     try:
+        if (self.rank == 0) and self.storage_backend_type == "attention_store":
```
❓ Question The `self.storage_backend_type == "attention_store"` check in this `if` is redundant: the code is already inside the outer `elif self.storage_backend_type == "attention_store":` branch, so the condition is necessarily true.
Suggested simplification:

```python
if self.rank == 0:
    self.storage_backend.flush_token_index(task_id, token_ids, 0, False)
```
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist